library(mosaic)
library(car)
library(plotly)
library(reshape2)
library(scatterplot3d)
library(mosaicData)
library(tidyverse)
library(lmtest)
library(readr)
Study Sections 8.1-8.6 – “Regression Models for Quantitative and Qualitative Predictors.”
Attempt and submit at least 60 Hard Work Points by Saturday at 11:59 PM.
Over 62 gets you +1 Final Exam Point.
Over 66 gets you +2 Final Exam Points.
These theory questions are to help you recognize the shape of four important regression models.
beta0 <- 6
beta1 <- -2
beta2 <- 3
par(mfrow=c(1,4), mai=c(0.1,0.1,0.8,0.1), cex.main=1.5)
curve(beta0 + beta1*x, xaxt='n', yaxt='n', main="SLR Model", ylab="", xlab="", lwd=3, col="darkgray")
curve(beta0 + beta1*x + beta2*x^2, xaxt='n', yaxt='n', main="Quadratic Model", ylab="", xlab="", lwd=3, col="darkgray")
curve(beta0 + beta1*x, xaxt='n', yaxt='n', main="Two-lines Model", ylab="", xlab="", lwd=3, col="darkgray")
curve((beta0 - 2) + (beta1 + 3)*x, xaxt='n', yaxt='n', main="Two-lines Model", ylab="", xlab="", lwd=3, col="darkgray", lty=2, add=TRUE)
curve(beta0 + beta1*x + beta2*x^2, xaxt='n', yaxt='n', main="Double Quadratic \n Model", ylab="", xlab="", lwd=3, col="darkgray", ylim=c(5,8), xlim=c(0,1.5))
curve((beta0+1) + (beta1+4)*x + (beta2-5)*x^2, xaxt='n', yaxt='n', main="Double Quadratic Model", ylab="", xlab="", lwd=3, col="darkgray", lty=2, add=TRUE)
They will also help you explore the 3D regression model.
# 3D scatter plot
s3d <- scatterplot3d(trees, type = "h", color = "darkgray",
angle=55, pch = 16, xlab="", ylab="", zlab="", main="3d Model", tick.marks=FALSE)
# Add regression plane
my.lm <- lm(Volume ~ Girth + Height, data=trees)
s3d$plane3d(my.lm, col="gray", draw_polygon = TRUE)
The SLR Model is the Simple Linear Regression Model:
\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \ \text{where} \ \epsilon_i \sim N(0,\sigma^2) \]
Data from this model is created by defining the \(X_i\), \(\beta_0\), \(\beta_1\), and \(\sigma\) and then sampling the \(\epsilon_i\) from a \(N(0,\sigma^2)\) distribution.
# Define the Xi:
X <- c(1.6, 5.6, 8.2, 9.7, 4.1, 6.7, 4.7, 4.1, 8.4, 7.2)
# Define the betas:
beta0 <- 3
beta1 <- 2.5
# Define sigma:
sigma <- 1.2
# Obtain a sample of the epsilon_i (10 of them) from a normal distribution with mean of 0 and standard deviation sigma:
set.seed(121) #Ensures the same "random" sample each time you knit.
epsilon <- rnorm(10, 0, sigma)
# Create the Yi using the model:
Y <- beta0 + beta1*X + epsilon
# Plot the data:
plot(Y ~ X, pch=16, col="darkgray")
# Plot the true regression equation with a dashed line:
curve(beta0 + beta1*x, add=TRUE, col="darkgray", lty=2)
# Obtain the fitted regression equation and plot:
slr_lm <- lm(Y ~ X)
abline(slr_lm, col="darkgray")
# Add a legend:
legend("topleft", bty="n", legend=c("True Line", "Fitted Line"), lty=c(2,1), col="gray", lwd=3)
Notice that a 95% confidence interval for \(\beta_0\) is \((0.92, 4.83)\) and for \(\beta_1\) is \((2.23, 2.84)\). Since the true values \(\beta_0 = 3\) and \(\beta_1 = 2.5\) fall inside their respective intervals, the regression did a good job of uncovering the truth.
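These intervals can be reproduced directly from the fitted object with confint():

```r
# Point estimates and 95% confidence intervals for beta0 and beta1
coef(slr_lm)
confint(slr_lm, level = 0.95)
```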
Show a scatterplot, true regression function \(E\{Y\}\), and fitted regression function \(\hat{Y}\), all on a single graphic, for data you fabricate using the simple linear regression model:
\[ Y_i = \beta_0 + \beta_1 X_{i} + \epsilon_i \quad \text{(SLR Model)} \]
where \(\epsilon_i \sim N(0,\sigma^2)\). Use the values of \(\beta_0 = 6\) and \(\beta_1 = -2\). Further, use the ten x-values provided in the code below.
X <- c(1.3, 1.8, 1.4, 2.1, 0.9, 1.5, 1.1, 0.5, 0.1, 0.2)
## Hint: rnorm(n, mean, standard deviation)
beta0 <- 8
beta1 <- -2
sigma <- 1.5
epsilon <- rnorm(10, 0, sigma)
Y <- beta0 + beta1*X + epsilon
one_lm <- lm(Y~X)
plot(Y ~ X, pch=16, col="blue")
curve(beta0 + beta1*x, add=TRUE, col="darkgray", lty=2)
abline(one_lm)
Provide confidence intervals for each of your coefficients and briefly write about how well your fitted model captured the true model.
one_lm$coefficients
## (Intercept) X
## 8.334002 -2.277504
confint(one_lm)
## 2.5 % 97.5 %
## (Intercept) 5.946596 10.7214083
## X -4.172629 -0.3823791
Both coefficient values used in the simulation fall inside their 95% confidence intervals (the intercept of 8 is in \((5.95, 10.72)\) and the slope of \(-2\) is in \((-4.17, -0.38)\)), so the fitted model captured the true model reasonably well, though the slope interval is wide with only ten points.
How does your choice of \(\sigma\) impact the results of the regression? (You may have to change \(\sigma\) and re-run the code a few times to see what happens.)
\(\sigma\) controls how far the points scatter around the true line: a larger \(\sigma\) produces noisier data, a fitted line that can drift farther from the true line, and wider confidence intervals for the coefficients.
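One way to see this is a small simulation that reuses the setup above at several noise levels (a sketch; the particular sigma values are arbitrary):

```r
# Re-simulate SLR data at increasing noise levels. Larger sigma scatters
# the points farther from the true line, so the fitted line drifts more.
set.seed(121)
X <- c(1.3, 1.8, 1.4, 2.1, 0.9, 1.5, 1.1, 0.5, 0.1, 0.2)
par(mfrow = c(1, 3))
for (s in c(0.5, 1.5, 3)) {
  Y <- 8 - 2 * X + rnorm(10, 0, s)
  plot(Y ~ X, pch = 16, col = "blue", main = paste("sigma =", s))
  curve(8 - 2 * x, add = TRUE, lty = 2, col = "darkgray")  # true line
  abline(lm(Y ~ X))                                        # fitted line
}
```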
What effect does each coefficient \(\beta_0\) and \(\beta_1\) have on the regression model? Change each coefficient one at a time and re-run the code to get a feel for what happens.
Changing \(\beta_0\) shifts the line up or down (it is the y-intercept), while changing \(\beta_1\) changes the slope of the line.
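The effect of each coefficient can also be seen by overlaying a few true lines (a sketch; the alternative coefficient values are arbitrary):

```r
# beta0 shifts the line vertically; beta1 changes its slope.
curve(8 - 2 * x, from = 0, to = 2.5, lwd = 2, ylim = c(0, 12),
      xlab = "x", ylab = "E{Y}")
curve(10 - 2 * x, from = 0, to = 2.5, add = TRUE, lty = 2)  # larger beta0: shifts up
curve(8 + 1 * x, from = 0, to = 2.5, add = TRUE, lty = 3)   # positive beta1: slope reversed
legend("bottomleft", lty = 1:3, bty = "n",
       legend = c("beta0=8, beta1=-2", "beta0=10, beta1=-2", "beta0=8, beta1=1"))
```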
Show a scatterplot, true regression function \(E\{Y\}\), and fitted regression function \(\hat{Y}\), all on a single graphic, for data you fabricate using the quadratic regression model:
\[ Y_i = \beta_0 + \beta_1 X_{i} + \beta_2 X_{i}^2 + \epsilon_i \quad \text{(Quadratic Model)} \]
where \(\epsilon_i \sim N(0,\sigma^2)\). Use the values of \(\beta_0 = 6\), \(\beta_1 = -2\) and \(\beta_2 = 3\). Further, use the ten x-values provided in the code below.
X <- c(1.3, 1.8, 1.4, 2.1, 0.9, 1.5, 1.1, 0.5, 0.1, 0.2)
## Hint: rnorm(n, mean, standard deviation)
beta_0 <- 8
beta_1 <- -6
beta_2 <- 4
sigma <- 9
epsilon <- rnorm(10, 0, sigma)
Y <- beta_0 + beta_1*X + beta_2*X^2 + epsilon
two_lm <- lm(Y~X + I(X^2))
plot(Y ~ X, pch=16, col="blue")
curve(beta_0 + beta_1*x + beta_2*x^2, add=TRUE, col="darkgray", lty=2) # true curve
b <- two_lm$coefficients
curve(b[1] + b[2]*x + b[3]*x^2, add=TRUE, col="red") # fitted quadratic curve
Provide confidence intervals for each of your coefficients and briefly write about how well your fitted model captured the true model.
two_lm$coefficients
## (Intercept) X I(X^2)
## -0.3215529 3.8244818 2.2745299
confint(two_lm)
## 2.5 % 97.5 %
## (Intercept) -12.38422 11.74112
## X -21.93227 29.58123
## I(X^2) -9.62004 14.16910
With \(\sigma = 9\) the noise overwhelms the signal: every confidence interval is extremely wide and contains 0, so the fitted model did a poor job of capturing the true model.
How does your choice of \(\sigma\) impact the results of the regression? (You may have to change \(\sigma\) and re-run the code a few times to see what happens.)
A larger \(\sigma\) scatters the points farther from the true curve, which makes the fitted curve less reliable and the confidence intervals wider; a smaller \(\sigma\) lets the fitted curve track the true curve closely.
What effect does each coefficient \(\beta_0\), \(\beta_1\), and \(\beta_2\) have on the regression model? Change each coefficient one at a time and re-run the code to get a feel for what happens.
\(\beta_0\) shifts the curve vertically, \(\beta_1\) tilts it (the slope at \(x = 0\)), and \(\beta_2\) controls the direction and strength of the curvature.
Show a scatterplot, true regression function \(E\{Y\}\), and fitted regression function \(\hat{Y}\), all on a single graphic, for data you fabricate using the two-lines regression model:
\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \epsilon_i \quad \text{(Two-lines Model)} \]
where \(\epsilon_i \sim N(0,\sigma^2)\). Use the values of \(\beta_0 = 6\), \(\beta_1 = -2\), \(\beta_2 = -2\), and \(\beta_3 = 4\). Further, use the ten x-value pairs provided in the code below.
X1 <- c(3.5, 2.8, 5.4, 8.1, 9.9, 4.5, 2.1, 2.5, 3.1, 9.2)
X2 <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0)
beta_0 <- 6
beta_1 <- -2
beta_2 <- -2
beta_3 <- 4
sigma <- 1.2
epsilon <- rnorm(10, 0, sigma)
Y <- beta_0 + beta_1*X1 + beta_2*X2 + beta_3*X1*X2 + epsilon
three_lm <- lm(Y~ X1 + X2 + X1:X2)
b <- three_lm$coefficients
plot(Y ~ X1, pch=16, col=as.factor(X2))
#abline(three_lm)
curve(b[1] + b[2]*x, add=TRUE, col= "red", lty=2)
curve((b[1] + b[3]) + (b[2]+b[4])*x, add = TRUE, col = "blue", lty = 2)
Provide confidence intervals for each of your coefficients and briefly write about how well your fitted model captured the true model.
three_lm$coefficients
## (Intercept) X1 X2 X1:X2
## 7.125111 -2.179543 -2.260929 4.075398
confint(three_lm)
## 2.5 % 97.5 %
## (Intercept) 4.335680 9.914542
## X1 -2.621143 -1.737943
## X2 -5.713606 1.191748
## X1:X2 3.488509 4.662287
The confidence interval for the X2 coefficient contains 0, which makes me question how useful that term is on its own, but the other intervals capture their true values and the fitted lines sit very close to the true lines.
How does your choice of \(\sigma\) impact the results of the regression? (You may have to change \(\sigma\) and re-run the code a few times to see what happens.)
\(\sigma\) does affect the regression: with a small \(\sigma\) like 1.2 the fitted lines stay close to the true lines, but increasing \(\sigma\) scatters both groups of points and widens all of the confidence intervals.
What effect does each coefficient \(\beta_0\), \(\beta_1\), \(\beta_2\), and \(\beta_3\) have on the regression model? Change each coefficient one at a time and re-run the code to get a feel for what happens.
\(\beta_0\) and \(\beta_1\) set the intercept and slope of the baseline (X2 = 0) line, while \(\beta_2\) shifts the intercept and \(\beta_3\) changes the slope of the second (X2 = 1) line relative to the first.
Show a scatterplot, true regression function \(E\{Y\}\), and fitted regression function \(\hat{Y}\), all on a single graphic, for data you fabricate using the double quadratic regression model:
\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i1}^2 + \beta_3 X_{i2} + \beta_4 X_{i1}X_{i2} + \beta_5 X_{i1}^2 X_{i2} + \epsilon_i \quad \text{(Double Quadratic Model)} \]
where \(\epsilon_i \sim N(0,\sigma^2)\). Use the values of \(\beta_0 = 6\), \(\beta_1 = -2\), \(\beta_2 = 3\), \(\beta_3 = 1\), \(\beta_4 = 4\), and \(\beta_5 = -5\). Further, use the ten x-value pairs provided in the code below.
beta0 <- 8
beta1 <- -4
beta2 <- 6
beta3 <- 4
beta4 <- 4
beta5 <- -5
sigma <- 1.2
epsilon <- rnorm(10, 0, sigma)
X1 <- c(1.15, 0.67, 0.05, 1.05, 0.38, 0.94, 0.40, 0.80, 0.70, 0.86)
X2 <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0)
Y <- beta0 + beta1*X1+beta2*X1^2+beta3*X2+beta4*X1*X2+beta5*X1^2*X2+epsilon
four_lm <- lm(Y ~ X1 + I(X1^2) + X2 + X1:X2 + I(X1^2):X2) # I() is required so X1^2 enters as a squared term
b <- four_lm$coefficients
plot(Y ~ X1, pch=16, col=as.factor(X2))
curve(beta0 + beta1*x + beta2*x^2, add=TRUE, col="darkgray", lty=2) # true curve, X2 = 0
curve((beta0 + beta3) + (beta1 + beta4)*x + (beta2 + beta5)*x^2, add=TRUE, col="blue", lty=2) # true curve, X2 = 1
curve(b[1] + b[2]*x + b[3]*x^2, add=TRUE, col="red") # fitted curve, X2 = 0
curve((b[1] + b[4]) + (b[2] + b[5])*x + (b[3] + b[6])*x^2, add=TRUE, col="red", lty=3) # fitted curve, X2 = 1
Provide confidence intervals for each of your coefficients and briefly write about how well your fitted model captured the true model.
confint(four_lm)
## 2.5 % 97.5 %
## (Intercept) 3.7781876 6.8954591
## X1 0.5723207 12.5742761
## I(X1^2) -6.7801162 3.1729996
## X2 3.2323642 8.3218201
## X1:X2 -6.2595706 0.2787282
four_lm$coefficients
## (Intercept) X1 I(X1^2) X2 X1:X2
## 5.336823 6.573298 -1.803558 5.777092 -2.990421
With this many parameters estimated from only ten points, the confidence intervals are wide and two of them (for the I(X1^2) and X1:X2 terms) contain 0, so the fitted model captures the true model only roughly.
How does your choice of \(\sigma\) impact the results of the regression? (You may have to change \(\sigma\) and re-run the code a few times to see what happens.)
The larger \(\sigma\) is, the farther the fitted curves drift from the true curves; with so few points and so many parameters, even moderate noise distorts the fit badly.
What effect does each coefficient \(\beta_0\), \(\beta_1\), \(\beta_2\), and \(\beta_3\) have on the regression model? Change each coefficient one at a time and re-run the code to get a feel for what happens.
\(\beta_0\), \(\beta_1\), and \(\beta_2\) set the intercept, slope, and curvature of the baseline (X2 = 0) curve, while \(\beta_3\), \(\beta_4\), and \(\beta_5\) shift the intercept, slope, and curvature of the second (X2 = 1) curve relative to the first.
Below is shown a scatterplot and fitted regression function \(\hat{Y}\) for the three-dimensional regression model:
\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i \quad \text{(3D Model)} \]
where \(\epsilon_i \sim N(0,\sigma^2)\).
## Hint: library(car) has a scatterplot 3d function which is simple to use
# but the code should only be run in your console, not knit.
## library(car)
## scatter3d(Y ~ X1 + X2, data=yourdata)
## To embed the 3d-scatterplot inside of your html document is harder.
#library(plotly)
#library(reshape2)
#Perform the multiple regression
air_lm <- lm(Ozone ~ Temp + Month, data= airquality)
#Graph Resolution (more important for more complex shapes)
graph_reso <- 0.5
#Setup Axis
axis_x <- seq(min(airquality$Temp), max(airquality$Temp), by = graph_reso)
axis_y <- seq(min(airquality$Month), max(airquality$Month), by = graph_reso)
#Sample points
air_surface <- expand.grid(Temp = axis_x, Month = axis_y, KEEP.OUT.ATTRS=F)
air_surface$Z <- predict.lm(air_lm, newdata = air_surface)
air_surface <- acast(air_surface, Month ~ Temp, value.var = "Z") #y ~ x
#Create scatterplot
plot_ly(airquality,
x = ~Temp,
y = ~Month,
z = ~Ozone,
text = rownames(airquality),
type = "scatter3d",
mode = "markers") %>%
add_trace(z = air_surface,
x = axis_x,
y = axis_y,
type = "surface")
Provide confidence intervals for each of the coefficients of this regression model.
confint(air_lm)
## 2.5 % 97.5 %
## (Intercept) -175.905002 -103.3226380
## Temp 2.158276 3.1606658
## Month -6.743454 -0.3003718
What is the estimated value of \(\sigma\) for this model?
summary(air_lm)
##
## Call:
## lm(formula = Ozone ~ Temp + Month, data = airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.490 -14.509 0.456 11.698 120.372
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -139.614 18.318 -7.622 8.39e-12 ***
## Temp 2.659 0.253 10.513 < 2e-16 ***
## Month -3.522 1.626 -2.166 0.0324 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.34 on 113 degrees of freedom
## (37 observations deleted due to missingness)
## Multiple R-squared: 0.5081, Adjusted R-squared: 0.4994
## F-statistic: 58.37 on 2 and 113 DF, p-value: < 2.2e-16
The residual standard error reported in the summary is already the estimate of \(\sigma\), so \(\hat{\sigma} = 23.34\); no square root is needed, since 23.34 is itself the square root of the MSE.
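The estimate can also be extracted directly rather than read off the printout:

```r
# sigma-hat is stored as 'sigma' in the summary object
summary(air_lm)$sigma  # approximately 23.34, the residual standard error
```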
Do you think this regression model fits the data well?
Moderately well: the model explains about half of the variation in ozone (\(R^2 \approx 0.51\)) and both predictors are significant, but a large share of the variation is left unexplained.
How would the shape of the regression model change if the values of \(\beta_0\), \(\beta_1\), and \(\beta_2\) were each changed separately?
Changing \(\beta_0\) raises or lowers the whole regression plane, changing \(\beta_1\) tilts the plane along the Temp axis, and changing \(\beta_2\) tilts it along the Month axis.
Fabricate a dataset using a regression model and normally distributed errors that generates a picture almost identical to the scatterplot in Application Problem 1.
Consider the scatterplot shown here of a single residence’s monthly gas bill according to the month of the year. See ?Utilities for more details on the data.
model <- lm(gasbill ~ month, data = Utilities) # gas bill is the response, month the predictor
summary(model)
##
## Call:
## lm(formula = month ~ gasbill, data = Utilities)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3882 -2.8431 -0.4433 1.8173 7.3545
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.540412 0.488582 15.433 <2e-16 ***
## gasbill -0.012017 0.004661 -2.578 0.0112 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.34 on 115 degrees of freedom
## Multiple R-squared: 0.05465, Adjusted R-squared: 0.04643
## F-statistic: 6.648 on 1 and 115 DF, p-value: 0.01119
plot(gasbill ~ month, data=Utilities, main="Single Residence in Minnesota", xlab="Month of the Year", ylab="Monthly Gas Bill (US Dollars)")
abline(model)
Add the estimated regression function to the scatterplot above and state the function below.
\[ \hat{Y}_i = b_0 + b_1 X_i \]
(The fitted regression function contains no \(\epsilon_i\) term; the error term belongs to the model, not to the estimate of \(E\{Y\}\).)
Diagnose the appropriateness of this regression model. How well does it fit the data?
This regression model does not fit the data well. The relationship between month and gas bill is clearly curved (bills are high in winter and low in summer), the residuals do not have equal variance, and they are not normally distributed.
Be sure to provide diagnostic plots and supporting arguments for your claims.
plot(model)
What range of possible gas bill amounts do you predict for the September bill? How confident are you in your prediction?
sept_model <- lm(gasbill ~ month, data = Utilities) # gas bill as response
predict(sept_model, data.frame(month = 9), interval = "prediction")
## fit lwr upr
## 1 7.432259 0.7554471 14.10907
This is a 95% prediction interval for a single new September bill: I am 95% confident the September bill will fall between the lower and upper limits shown above, with the fitted value as the single best guess.
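For contrast, a confidence interval for the mean September bill is narrower than the prediction interval, because it excludes the bill-to-bill noise. A sketch (refitting the bill-on-month model so the chunk is self-contained):

```r
# Interval for the average September bill vs. a single September bill
sep_lm <- lm(gasbill ~ month, data = Utilities)
predict(sep_lm, data.frame(month = 9), interval = "confidence")  # mean bill
predict(sep_lm, data.frame(month = 9), interval = "prediction")  # one bill
```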
View the mtcars dataset and corresponding help file ?mtcars.
Perform a regression that predicts the miles per gallon mpg of the vehicle based on the quarter mile time qsec and transmission type am of the vehicle.
Plot the data and your fitted regression model.
cars_lm <- lm(mpg ~ qsec * am, data = mtcars)
summary(cars_lm)
##
## Call:
## lm(formula = mpg ~ qsec * am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4551 -1.4331 0.1918 2.2493 7.2773
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.0099 8.2179 -1.096 0.28226
## qsec 1.4385 0.4500 3.197 0.00343 **
## am -14.5107 12.4812 -1.163 0.25481
## qsec:am 1.3214 0.7017 1.883 0.07012 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.343 on 28 degrees of freedom
## Multiple R-squared: 0.722, Adjusted R-squared: 0.6923
## F-statistic: 24.24 on 3 and 28 DF, p-value: 6.129e-08
coef <- cars_lm$coefficients
plot(mpg ~ qsec, data = mtcars, pch=16, col=as.factor(am))
#abline(cars_lm)
curve(coef[1] + coef[2]*x, add=TRUE, col= "red", lty=2)
curve((coef[1] + coef[3]) + (coef[2]+coef[4])*x, add = TRUE, col = "blue", lty = 2)
State the fitted regression model.
\[ \hat{Y}_i = -9.0099 + 1.4385 X_{i1} - 14.5107 X_{i2} + 1.3214 X_{i1} X_{i2} \]
where \(X_{i1}\) is qsec and \(X_{i2}\) is am.
Perform an appropriate test to determine if the interaction term is needed in this regression model.
summary(cars_lm)
##
## Call:
## lm(formula = mpg ~ qsec * am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4551 -1.4331 0.1918 2.2493 7.2773
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.0099 8.2179 -1.096 0.28226
## qsec 1.4385 0.4500 3.197 0.00343 **
## am -14.5107 12.4812 -1.163 0.25481
## qsec:am 1.3214 0.7017 1.883 0.07012 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.343 on 28 degrees of freedom
## Multiple R-squared: 0.722, Adjusted R-squared: 0.6923
## F-statistic: 24.24 on 3 and 28 DF, p-value: 6.129e-08
Looking at the p-value for the qsec:am term (0.070), the interaction is not significant at the 0.05 level, though it is borderline.
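The t-test on the qsec:am row is one way to test the interaction; an equivalent check is a general linear F-test comparing the models with and without it:

```r
# F-test for the interaction: reduced (additive) vs. full (interaction) model
reduced <- lm(mpg ~ qsec + am, data = mtcars)
full    <- lm(mpg ~ qsec * am, data = mtcars)
anova(reduced, full)  # p-value matches the t-test on qsec:am
```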
Diagnose the appropriateness of this regression model. How well does it fit the data?
Be sure to provide diagnostic plots and supporting arguments for your claims.
plot(cars_lm)
The regression model fits this data reasonably well: the residuals show roughly equal variance and are approximately normally distributed.
View the mtcars dataset and corresponding help file ?mtcars.
Perform a regression that predicts the quarter mile time qsec of the vehicle based on the displacement of the engine disp and transmission type am of the vehicle.
Plot the data and your fitted regression model.
mt_lm <- lm(qsec ~ disp + I(disp^2), data = mtcars)
summary(mt_lm)
##
## Call:
## lm(formula = qsec ~ disp + I(disp^2), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8015 -1.0945 0.1651 0.7613 4.5488
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.053e+01 1.269e+00 16.170 4.76e-16 ***
## disp -1.895e-02 1.165e-02 -1.626 0.115
## I(disp^2) 2.488e-05 2.236e-05 1.112 0.275
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.63 on 29 degrees of freedom
## Multiple R-squared: 0.2213, Adjusted R-squared: 0.1676
## F-statistic: 4.121 on 2 and 29 DF, p-value: 0.02659
b <- mt_lm$coefficients
plot(qsec ~ disp, data = mtcars, col = as.factor(am))
curve(b[1] + b[2]*x + b[3]*x^2, add = TRUE, col = "red", lty = 2)
State the fitted regression model.
\[ \hat{Y}_i = 20.526 - 0.018947 X_i + 0.000024876 X_i^2 \]
Perform appropriate tests to determine which interaction terms are needed in this regression model.
summary(mt_lm)
##
## Call:
## lm(formula = qsec ~ disp + I(disp^2), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8015 -1.0945 0.1651 0.7613 4.5488
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.053e+01 1.269e+00 16.170 4.76e-16 ***
## disp -1.895e-02 1.165e-02 -1.626 0.115
## I(disp^2) 2.488e-05 2.236e-05 1.112 0.275
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.63 on 29 degrees of freedom
## Multiple R-squared: 0.2213, Adjusted R-squared: 0.1676
## F-statistic: 4.121 on 2 and 29 DF, p-value: 0.02659
As fit, this model contains no interaction terms and does not yet include transmission type am; neither disp term is individually significant, so the natural candidates to test are am and its interactions with the disp terms.
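One way to test whether transmission type (and its interactions with the disp terms) is needed is an F-test against a richer model; a sketch:

```r
# Does am shift or reshape the disp curve? Compare nested models.
base <- lm(qsec ~ disp + I(disp^2), data = mtcars)
rich <- lm(qsec ~ (disp + I(disp^2)) * am, data = mtcars)  # adds am, disp:am, I(disp^2):am
anova(base, rich)
```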
Diagnose the appropriateness of this regression model. How well does it fit the data?
Be sure to provide diagnostic plots and supporting arguments for your claims.
plot(mt_lm)
qqPlot(mt_lm)
## Merc 230 Ferrari Dino
## 9 30
bptest(mt_lm)
##
## studentized Breusch-Pagan test
##
## data: mt_lm
## BP = 2.0691, df = 2, p-value = 0.3554
The diagnostics look reasonable: the residuals show roughly equal variance (the Breusch-Pagan test fails to reject constant variance, p = 0.36) and are approximately normally distributed aside from the two flagged cars. That said, with \(R^2 \approx 0.22\) the model explains little of the variation in qsec, so the fit is weak even if the assumptions hold.
View the mtcars dataset and corresponding help file ?mtcars.
Create a meaningful 3-dimensional regression and scatterplot of your own choosing.
Plot the data and your fitted regression model.
# I have my 3D plot but couldn't get it into the markdown; I will upload it with everything else.
last_lm <- lm(mpg ~ cyl * hp, data = mtcars)
summary(last_lm)
##
## Call:
## lm(formula = mpg ~ cyl * hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.778 -1.969 -0.228 1.403 6.491
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 50.751207 6.511686 7.794 1.72e-08 ***
## cyl -4.119140 0.988229 -4.168 0.000267 ***
## hp -0.170680 0.069102 -2.470 0.019870 *
## cyl:hp 0.019737 0.008811 2.240 0.033202 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.974 on 28 degrees of freedom
## Multiple R-squared: 0.7801, Adjusted R-squared: 0.7566
## F-statistic: 33.11 on 3 and 28 DF, p-value: 2.386e-09
State the fitted regression model.
\[ \hat{Y}_i = 50.751 - 4.119 X_{i1} - 0.1707 X_{i2} + 0.0197 X_{i1} X_{i2} \]
where \(X_{i1}\) is cyl and \(X_{i2}\) is hp.
Provide 95% confidence intervals for each parameter in your regression model. Discuss what they show about your model?
confint(last_lm)
## 2.5 % 97.5 %
## (Intercept) 37.412623841 64.08979047
## cyl -6.143435023 -2.09484401
## hp -0.312228221 -0.02913198
## cyl:hp 0.001689154 0.03778566
None of the confidence intervals contain 0, so cylinder count, horsepower, and their interaction all appear to be useful predictors of miles per gallon. The intervals for hp and cyl:hp are also quite narrow, which means those effects are estimated fairly precisely.
Diagnose the appropriateness of this regression model. How well does it fit the data?
Be sure to provide diagnostic plots and supporting arguments for your claims.
qqPlot(last_lm)
## Toyota Corolla Lotus Europa
## 20 28
bptest(last_lm)
##
## studentized Breusch-Pagan test
##
## data: last_lm
## BP = 3.9289, df = 3, p-value = 0.2692
We fail to reject the null hypothesis of constant variance (BP p = 0.269), so the equal-variance assumption appears reasonable; the QQ plot likewise shows no serious departure from normality apart from the two flagged cars.
View the Births78 dataset and corresponding help file ?Births78.
plot(births ~ day_of_year, data=Births78, main="1978 Daily Birth Totals", xlab="Day of the Year", ylab="Number of Births Recorded")
Fit an appropriate regression model to this data and draw the fitted model on the plot above.
Note that while we have considered a few basic regression models, there are infinitely many models that can be created. The quadratic model can be called a “second order model.” This hints that there could be third, fourth, fifth and so on order models as well, which is true.
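As a sketch of that idea, a higher-order polynomial can be fit to the birth totals with poly() (the order here is an arbitrary choice for illustration, not a recommended model):

```r
# Fit and draw a fourth-order polynomial trend for the daily birth totals
poly_lm <- lm(births ~ poly(day_of_year, 4), data = Births78)
plot(births ~ day_of_year, data = Births78,
     xlab = "Day of the Year", ylab = "Number of Births Recorded")
ord <- order(Births78$day_of_year)
lines(Births78$day_of_year[ord], fitted(poly_lm)[ord], lwd = 2)
```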
State the fitted regression model.
\[ \hat{Y}_i = \]
Provide a 95% confidence interval for the estimated difference between the averages of the two distinct patterns shown in the data.
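The two bands in the scatterplot correspond to weekdays versus weekends. A sketch of one approach, assuming Births78 contains the wday day-of-week variable (check ?Births78), is to build a weekend indicator and read off its confidence interval:

```r
# Indicator for weekend days, then a model with a weekday/weekend shift
Births78$weekend <- ifelse(Births78$wday %in% c("Sat", "Sun"), 1, 0)
births_lm <- lm(births ~ day_of_year + weekend, data = Births78)
confint(births_lm)["weekend", ]  # 95% CI for the weekend-vs-weekday gap
```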
Diagnose the appropriateness of this regression model. How well does it fit the data?
Be sure to provide diagnostic plots and supporting arguments for your claims.